subspace projection



Subspace Detours: Building Transport Plans that are Optimal on Subspace Projections

Neural Information Processing Systems

Computing optimal transport (OT) between measures in high dimensions is doomed by the curse of dimensionality. A popular approach to avoid this curse is to project input measures on lower-dimensional subspaces (1D lines in the case of sliced Wasserstein distances), solve the OT problem between these reduced measures, and settle for the Wasserstein distance between these reductions, rather than that between the original measures. This approach is, however, difficult to extend to the case in which one wants to compute an OT map (a Monge map) between the original measures. Since computations are carried out on lower-dimensional projections, classical map estimation techniques can only produce maps operating in these reduced dimensions. We propose in this work two methods to extrapolate, from a transport map that is optimal on a subspace, one that is nearly optimal in the entire space. We prove that the best optimal transport plan that takes such subspace detours is a generalization of the Knothe-Rosenblatt transport. We show that these plans can be explicitly formulated when comparing Gaussian measures (between which the Wasserstein distance is commonly referred to as the Bures or Fréchet distance). We provide an algorithm to select optimal subspaces given pairs of Gaussian measures, and study scenarios in which that mediating subspace can be selected using prior information. We consider applications to semantic mediation between elliptic word embeddings and domain adaptation with Gaussian mixture models.
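The project-then-solve strategy the abstract describes can be sketched in a few lines for the 1D (sliced) case; this is an illustrative toy, not the paper's code, and it relies on the standard fact that on a line the optimal transport map between empirical measures is monotone rank matching:

```python
import numpy as np

def subspace_ot_map_1d(x, y, direction):
    """Project two empirical measures onto a 1D line and build the
    optimal (monotone) assignment between the projections.

    x, y : (n, d) arrays of samples from the two measures.
    direction : (d,) unit vector spanning the 1D subspace.
    Returns a permutation sending each x[i] to its target y[j]
    under the optimal 1D assignment of the projections.
    """
    px = x @ direction            # coordinates of x on the line
    py = y @ direction            # coordinates of y on the line
    # In 1D, the optimal map is monotone: sort both sides, match ranks.
    order_x = np.argsort(px)
    order_y = np.argsort(py)
    assignment = np.empty(len(x), dtype=int)
    assignment[order_x] = order_y
    return assignment

# Toy example: two 2D point clouds, subspace = first coordinate axis.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 2))
y = rng.normal(size=(5, 2)) + 3.0
e = np.array([1.0, 0.0])
plan = subspace_ot_map_1d(x, y, e)
```

Such a map only acts on the projected coordinates; lifting it back to a nearly optimal map in the ambient space is exactly the "subspace detour" the paper's two extrapolation methods address.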


SAFA-SNN: Sparsity-Aware On-Device Few-Shot Class-Incremental Learning with Fast-Adaptive Structure of Spiking Neural Network

Zhang, Huijing, Cao, Muyang, Jiang, Linshan, Du, Xin, Yu, Di, Lv, Changze, Deng, Shuiguang

arXiv.org Artificial Intelligence

Continuous learning of novel classes is crucial for edge devices to preserve data privacy and maintain reliable performance in dynamic environments. However, the scenario becomes particularly challenging when data samples are insufficient, requiring on-device few-shot class-incremental learning (FSCIL) to maintain consistent model performance. Although existing work has explored parameter-efficient FSCIL frameworks based on artificial neural networks (ANNs), their deployment is still fundamentally constrained by limited device resources. Inspired by neural mechanisms, spiking neural networks (SNNs) process spatiotemporal information efficiently, offering lower energy consumption and greater biological plausibility than ANNs, along with compatibility with neuromorphic hardware. In this work, we present an SNN-based method for on-device FSCIL, i.e., Sparsity-Aware and Fast-Adaptive SNN (SAFA-SNN). We first propose sparsity-conditioned neuronal dynamics, in which most neurons remain stable while a subset stays active, thereby mitigating catastrophic forgetting. To further cope with the non-differentiability of spikes in gradient estimation, we employ zeroth-order optimization. Moreover, during incremental learning sessions, we enhance the discriminability of new classes through subspace projection, which alleviates overfitting to novel classes. Extensive experiments on two standard benchmark datasets (CIFAR100 and Mini-ImageNet) and three neuromorphic datasets (CIFAR-10-DVS, DVS128Gesture, and N-Caltech101) demonstrate that SAFA-SNN outperforms baseline methods, achieving at least a 4.01% improvement at the last incremental session on Mini-ImageNet and 20% lower energy cost than baseline methods in a practical implementation.
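The zeroth-order step mentioned in the abstract can be illustrated with a generic two-point gradient estimator (a standard construction; the function below is our sketch, not the paper's implementation): it probes the loss along random directions, so no derivative of the spike activation is ever required.

```python
import numpy as np

def zeroth_order_grad(loss, theta, mu=1e-3, n_dirs=500, rng=None):
    """Two-point zeroth-order gradient estimator.

    Approximates grad loss(theta) without backpropagation by finite
    differences of the loss along random Gaussian directions, which
    sidesteps the non-differentiability of spiking activations.
    """
    rng = rng or np.random.default_rng(0)
    g = np.zeros_like(theta)
    for _ in range(n_dirs):
        u = rng.normal(size=theta.shape)
        g += (loss(theta + mu * u) - loss(theta - mu * u)) / (2 * mu) * u
    return g / n_dirs

# Sanity check on a smooth quadratic, whose true gradient is 2*theta.
theta = np.array([1.0, -2.0])
g = zeroth_order_grad(lambda t: np.sum(t**2), theta)
```

With enough probing directions the estimate concentrates around the true gradient; in practice the number of directions trades accuracy against the number of forward passes.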



An Efficient Subspace Algorithm for Federated Learning on Heterogeneous Data

Zhang, Jiaojiao, Xu, Yuqi, Yuan, Kun

arXiv.org Artificial Intelligence

This work addresses the key challenges of applying federated learning to large-scale deep neural networks, particularly client drift caused by data heterogeneity across clients and the high costs of communication, computation, and memory. We propose FedSub, an efficient subspace algorithm for federated learning on heterogeneous data. Specifically, FedSub uses subspace projection to confine each client's local updates to low-dimensional subspaces, thereby reducing communication, computation, and memory costs. Additionally, it incorporates low-dimensional dual variables to mitigate client drift. We provide a convergence analysis that reveals the impact of key factors such as the step size and the subspace projection matrices on convergence. Experimental results demonstrate its efficiency.
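The cost-saving mechanism can be sketched in a few lines of numpy (our illustration, with a random orthonormal matrix standing in for whatever projection FedSub prescribes): the d-dimensional local gradient is compressed to k coordinates, so only k numbers need to travel between client and server.

```python
import numpy as np

d, k = 100, 5
rng = np.random.default_rng(1)
# Projection matrix with orthonormal columns spanning the subspace.
P, _ = np.linalg.qr(rng.normal(size=(d, k)))

grad = rng.normal(size=d)      # full local gradient on a client
low_dim = P.T @ grad           # k-dimensional message (what is sent)
update = P @ low_dim           # reconstruction, lies in span(P)
# `update` is the orthogonal projection of `grad` onto the subspace,
# so the residual grad - update is orthogonal to every column of P.
```

Communication drops from d to k floats per round, at the price of discarding the gradient component orthogonal to the subspace.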


Reviews: Subspace Detours: Building Transport Plans that are Optimal on Subspace Projections

Neural Information Processing Systems

Post-response comments (from discussion): "I feel the response did a good job of answering points of confusion, and also added an interesting example application of color transfer. This last example/experiment is heartening, because they finally use one of their maps (MK) in an application instead of just using the distances. I would be quite interested to see a more comprehensive exploration of this, as well as further applications of the MI and MK maps (perhaps in domain adaptation or Waddington-OT, which they mention elsewhere?). It's also important to note that they included a new algorithm for subspace selection which performs projected gradient descent on the basis vectors of the subspace, which outperforms their old method in the synthetic cases. This is a nice discovery, but I think will necessitate more than a minor structural change to the paper. One would expect a more complete presentation of the algorithm, including discussion of convergence (to local optima at least?) and runtime. It would also be nice to find a non-synthetic use case for this subspace selection, if possible. In light of their response, I feel that this paper is on the right track, but could use another iteration to better argue for applicability of their maps, and to update their algorithm for subspace selection."



Projection Robust Wasserstein Distance and Riemannian Optimization

Lin, Tianyi, Fan, Chenyou, Ho, Nhat, Cuturi, Marco, Jordan, Michael I.

arXiv.org Machine Learning

Projection robust Wasserstein (PRW) distance, or Wasserstein projection pursuit (WPP), is a robust variant of the Wasserstein distance. Recent work suggests that this quantity is more robust than the standard Wasserstein distance, in particular when comparing probability measures in high dimensions. However, it has been dismissed as impractical because the underlying optimization problem is non-convex and non-smooth, seemingly making its computation intractable. Our contribution in this paper is to revisit the original motivation behind WPP/PRW and take the hard route of showing that, despite its non-convexity and non-smoothness, and even despite some hardness results proved by~\citet{Niles-2019-Estimation} in a minimax sense, the original formulation of PRW/WPP \textit{can} be efficiently computed in practice using Riemannian optimization, yielding in relevant cases better behavior than its convex relaxation. More specifically, we provide three simple algorithms with solid theoretical guarantees on their complexity bounds (one in the appendix), and demonstrate their effectiveness and efficiency through extensive experiments on synthetic and real data. This paper provides a first step toward a computational theory of the PRW distance and establishes links between optimal transport and Riemannian optimization.
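The Riemannian machinery involved can be illustrated with one generic gradient step on the Stiefel manifold of orthonormal k-frames, the search space for the projection in PRW (a textbook construction, not the paper's specific algorithms):

```python
import numpy as np

def stiefel_step(U, G, step=0.1):
    """One ascent step on the Stiefel manifold {U : U^T U = I_k}.

    G is the Euclidean gradient of the objective at U. It is first
    projected onto the tangent space at U, then a QR retraction maps
    the moved point back onto the manifold.
    """
    # Tangent-space projection: subtract U times the symmetric part
    # of U^T G, leaving a direction tangent to the manifold at U.
    sym = (U.T @ G + G.T @ U) / 2
    xi = G - U @ sym
    # QR retraction: re-orthonormalize the moved point.
    Q, _ = np.linalg.qr(U + step * xi)
    return Q

rng = np.random.default_rng(2)
U, _ = np.linalg.qr(rng.normal(size=(10, 2)))   # a 2-frame in R^10
G = rng.normal(size=(10, 2))                    # Euclidean gradient
U_next = stiefel_step(U, G)
```

Iterating such steps on the PRW objective, maximizing the Wasserstein distance between k-dimensional projections, is the kind of scheme the paper analyzes.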


Subspace Detours: Building Transport Plans that are Optimal on Subspace Projections

Muzellec, Boris, Cuturi, Marco

Neural Information Processing Systems

Computing optimal transport (OT) between measures in high dimensions is doomed by the curse of dimensionality. A popular approach to avoid this curse is to project input measures on lower-dimensional subspaces (1D lines in the case of sliced Wasserstein distances), solve the OT problem between these reduced measures, and settle for the Wasserstein distance between these reductions, rather than that between the original measures. This approach is, however, difficult to extend to the case in which one wants to compute an OT map (a Monge map) between the original measures. Since computations are carried out on lower-dimensional projections, classical map estimation techniques can only produce maps operating in these reduced dimensions. We propose in this work two methods to extrapolate, from a transport map that is optimal on a subspace, one that is nearly optimal in the entire space.


A Fast Deflation Method for Sparse Principal Component Analysis via Subspace Projections

Xu, Cong, Yang, Min, Zhang, Jin

arXiv.org Machine Learning

Given a data set, PCA aims at finding a sequence of orthogonal vectors that represent the directions of largest variance. By capturing these directions, the principal components offer a way to compress the data with minimum information loss. However, principal components are usually linear combinations of all original features; that is, the weights in the linear combinations (known as loadings) are typically nonzero. In this sense, it is difficult to give them a good physical interpretation. During the past decade, various sparse principal component analysis (SPCA) approaches have been developed to improve the interpretability of principal components. SPCA is an extension of PCA that aims at finding sparse loading vectors capturing the maximum amount of variance in the data. These SPCA methods can be categorized into two groups: block methods [16,20,22-24,32] and deflation methods [5,7,25,28]. Block methods aim to find all sparse loadings together, while deflation methods compute one loading at a time.
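A deflation step of the kind the passage describes can be sketched with standard projection deflation (our illustration; the paper's subspace-projection variant may differ): once a loading is found, the covariance is projected onto its orthogonal complement so the next loading captures variance not already explained.

```python
import numpy as np

def projection_deflation(cov, v):
    """Deflate a covariance matrix with respect to a (sparse) loading v.

    After deflation, v carries no variance, so the next loading found
    on the deflated matrix explains variance orthogonal to v.
    """
    v = v / np.linalg.norm(v)
    P = np.eye(len(v)) - np.outer(v, v)   # projector onto v's complement
    return P @ cov @ P

rng = np.random.default_rng(3)
A = rng.normal(size=(6, 6))
cov = A @ A.T                              # a PSD "covariance" matrix
_, V = np.linalg.eigh(cov)
v1 = V[:, -1]                              # leading eigenvector
cov_defl = projection_deflation(cov, v1)   # v1 now explains no variance
```

Deflation methods repeat this find-one-loading-then-deflate loop, which is what distinguishes them from block methods that optimize all loadings jointly.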